Unified store queue for reducing linear aliasing effects

ABSTRACT

Embodiments herein provide for improved store-to-load-forwarding (STLF) logic and linear aliasing effect reduction logic. In one embodiment, a load instruction to be executed is selected. Whether a first linear address associated with said load instruction matches a linear address of a store instruction of a plurality of store instructions in a queue is determined. Data associated with said store instruction for executing said load instruction is forwarded, in response to determining that the first linear address matches the linear address of the store instruction.

FIELD OF THE DISCLOSURE

This application relates generally to processing systems, and, more particularly, to organizing store queue entries in processing systems.

DESCRIPTION OF RELATED ART

Processors generally use memory operations to move data to and from memory. The term “memory operation” refers to an operation that specifies a transfer of data between a processor and memory (or cache). Load memory operations specify a transfer of data from memory to the processor, and store memory operations specify a transfer of data from the processor to memory.

Processing systems generally utilize two basic memory access instructions: a store instruction that writes information from a register to a memory location and a load instruction that reads information out of a memory location and loads the information into a register. Store and load instructions typically operate on memory locations in one or more caches associated with the processor. Values from store instructions are not committed to the memory system (e.g., the caches) immediately after execution of the store instruction. Instead, the store instructions, including the memory address and store data, are buffered in a store queue so they can be written in-order. Eventually, the store commits and the buffered data is written to the memory system. Buffering store instructions can be used to help reorder store instructions so that they can commit in order. However, buffering store instructions can introduce other complications. For example, a load instruction can read an old, out-of-date value from a memory address if a store instruction executes and buffers data for the same memory address in the store queue and the load attempts to read the memory value before the store instruction has retired.

A technique called store-to-load forwarding can provide data directly from the store queue to a requesting load. For example, the store queue can forward data from completed but not-yet-committed (“in-flight”) store instructions to later (younger) load instructions. The store queue in this case functions as a Content-Addressable Memory (CAM) that can be searched using the memory address instead of a simple FIFO queue. When store-to-load forwarding is implemented, each load instruction searches the store queue for in-flight store instructions to the same address. The load instruction can obtain the requested data value from a matching store instruction that is logically earlier in program order (i.e., older). If there is no matching store instruction, the load instruction can access the memory system to obtain the requested value as long as any preceding matching store instructions have been retired and have committed their values to the memory.
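The search described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware CAM itself: the names StoreEntry, age, and load_value are hypothetical, and the choice of the youngest of the older matching stores is an assumption consistent with the detailed description later in this document.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    age: int      # program-order sequence number (assumed); smaller means older
    address: int  # memory address targeted by the store
    data: int     # buffered store data, not yet committed to memory


def load_value(load_age, address, store_queue, memory):
    """Return the value a load at `address` should observe."""
    # Search the queue by address, keeping only stores older than the load.
    matches = [s for s in store_queue
               if s.age < load_age and s.address == address]
    if matches:
        # Forward from the youngest of the older matching stores.
        return max(matches, key=lambda s: s.age).data
    # No in-flight match: fall back to the memory system.
    return memory.get(address, 0)
```

In this model, a load that is older than every matching store falls through to memory, which mirrors the requirement that only logically earlier stores may forward.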

Some state-of-the-art store queue allocation policies may result in delays that can degrade performance of the system. For example, a store instruction may need to wait until it is eligible to execute, is allocated a store entry that can hold data, and receives the data before performing store-to-load forwarding (STLF) to a load instruction with a matching address. Once the store entry has been allocated, the store instruction may be eligible to perform STLF of the received data to the matching load instruction. For another example, STLF may be delayed if a store instruction is waiting for the results of another operation, i.e., the store data is dependent upon another operation. The store instruction waits until the operation has completed and, once the operation has completed and the store instruction has received the results, the store instruction sends a wake-up signal to the load instruction so that STLF may be performed from the store instruction to the load instruction. Moreover, in conventional schemes using STLF, linear aliasing may occur, which refers to multiple linear addresses mapping to the same physical address. In this case, incorrect data may be forwarded during the storing process, leading to errors.

SUMMARY OF EMBODIMENTS

The following presents a simplified summary of the disclosed subject matter in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an exhaustive overview of the disclosed subject matter. It is not intended to identify key or critical elements of the disclosed subject matter or to delineate the scope of the disclosed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.

Some embodiments may include a method for reducing an effect of linear aliasing problems. A load for executing an instruction is selected. The load is associated with one of a plurality of stores in a queue. An order of a plurality of stores in the queue is determined based upon a pointer associated with the queue. At least a subset of a first linear address associated with the execution of the instruction is compared to corresponding subsets of linear addresses of the plurality of stores in the queue, based upon the pointer. The first linear address associated with the execution of the instruction is compared to a linear address of a matching store, in response to a determination that the subset of the first linear address matches the corresponding subset of the linear address of a matching store. A linear aliasing effect reduction process is performed in response to a determination that the full first linear address matches the linear address of the matching store. The data associated with the matching store is forwarded for executing the instruction in response to the comparing of the first linear address, and the linear aliasing effect reduction process.

Some embodiments may include a method that includes providing a first linear address to a unified store queue (USQ) comprising a plurality of stores, for executing an instruction, and determining an order of a plurality of stores in the USQ based upon a pointer associated with the queue. The method also includes comparing at least a first subset of the first linear address to the corresponding linear addresses of a plurality of stores in the queue, based upon the pointer, for determining a matching store; comparing the first linear address to the linear address of the matching store to determine whether a full matching store is present, in response to a determination that the subset of the linear address matched the corresponding subset of the linear address of the matching store; comparing the physical address of the matching store to the physical addresses of a plurality of stores of the USQ to determine whether there is a physical address match in response to determining the full matching store is present; and comparing a second subset of the linear address of the matching store to a corresponding subset of a plurality of stores of the USQ to determine whether there is a linear address match. The method further includes determining that a linear alias is present in response to a determination that there is a physical address match and a linear address mismatch; and blocking data from the matching store from being used for executing the instruction in response to a determination that the linear alias is present.

Some embodiments provide an integrated circuit that includes a processor for executing an instruction. The processor comprises a queue unit configured to: receive a linear address for executing the instruction; compare at least a subset of a first linear address associated with the instruction to corresponding subsets of linear addresses of a plurality of stores to determine whether a matching store is present; compare the full first linear address associated with the instruction to the corresponding full linear address of the matching store in response to a determination that the subset of the first linear address matches the corresponding subset of the linear address of at least one other store of the plurality; and perform a linear aliasing effect reduction process in response to a determination that the full first linear address matches the linear address of the matching store, wherein the linear aliasing effect reduction process comprises comparing the physical address of the matching store to the physical addresses of a plurality of stores to determine whether there is a physical location match, comparing a second subset of a plurality of stores of the queue to determine whether there is a second subset mismatch, and blocking forwarding of the load associated with the matching store in response to the presence of a physical location match and the mismatch.

Some embodiments provide an integrated circuit that includes a processor for executing an instruction; a unified store queue (USQ) unit for providing at least one pointer for a plurality of stores; and a store to load forward (STLF) logic. The STLF logic is configured to receive a linear address for executing the instruction; compare at least a first subset of a linear address associated with the instruction to corresponding first subsets of linear addresses of a plurality of stores in the USQ to locate a matching store; compare the full linear address to the full linear address of the matching store; and perform a linear aliasing reduction process in response to a determination that the full linear address matches the linear address of the matching store.

Some embodiments provide a non-transitory computer-readable medium storing instructions executable by at least one processor for fabricating an integrated circuit device. The integrated circuit device is capable of reducing an effect of linear aliasing problems. The integrated circuit device includes a processor for executing an instruction; a unified store queue (USQ) unit for providing at least one pointer for a plurality of stores; and a store to load forward (STLF) logic. The STLF logic is configured to receive a linear address for executing the instruction; compare at least a first subset of a linear address associated with the instruction to corresponding first subsets of linear addresses of a plurality of stores in the USQ to locate a matching store; compare the full linear address to the full linear address of the matching store; and perform a linear aliasing reduction process in response to a determination that the full linear address matches the linear address of the matching store.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed subject matter may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:

FIG. 1 conceptually illustrates a computer system, according to some embodiments;

FIG. 2 provides a representation of a CPU, depicted in FIG. 1, in accordance with some embodiments;

FIG. 3A conceptually illustrates an example of unified store queue entries, according to some embodiments;

FIG. 3B conceptually illustrates one example of a store queue such as the store queue shown in FIG. 2, according to some embodiments;

FIG. 4 conceptually illustrates an example of a computer system that includes result buses and scheduling buses, according to some embodiments;

FIG. 5 provides a representation of a processor depicted in FIG. 1, in accordance with some embodiments;

FIG. 6A provides a representation of a silicon die/chip that includes one or more circuits as shown in FIG. 3, in accordance with some embodiments;

FIG. 6B provides a representation of a silicon wafer which includes one or more dies/chips that may be produced in a fabrication facility, in accordance with some embodiments;

FIG. 7 is a flowchart of a method relating to a store operation, in accordance with some embodiments;

FIG. 8 is a flowchart of a method relating to a linear alias reduction process, in accordance with some embodiments.

While the disclosed subject matter may be modified and may take alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the disclosed subject matter to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the appended claims.

DETAILED DESCRIPTION

Illustrative embodiments are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It should be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions should be made, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. The description and drawings merely illustrate the principles of the claimed subject matter. It should thus be appreciated that those skilled in the art may be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles described herein and may be included within the scope of the claimed subject matter. Furthermore, all examples recited herein are principally intended to be for pedagogical purposes to aid the reader in understanding the principles of the claimed subject matter and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. The word “exemplary” is intended to serve as one example and not to limit the application by construing the example or embodiment as preferred or advantageous over other embodiments.

The disclosed subject matter is described with reference to the attached figures. Various structures, systems and devices are schematically depicted in the drawings for purposes of explanation only and so as to not obscure the description with details that are well known to those skilled in the art. Nevertheless, the attached drawings are included to describe and explain illustrative examples of the disclosed subject matter. The words and phrases used herein should be understood and interpreted to have a meaning consistent with the understanding of those words and phrases by those skilled in the relevant art. No special definition of a term or phrase, i.e., a definition that is different from the ordinary and customary meaning as understood by those skilled in the art, is intended to be implied by consistent usage of the term or phrase herein. To the extent that a term or phrase is intended to have a special meaning, i.e., a meaning other than that understood by skilled artisans, such a special definition is expressly set forth in the specification in a definitional manner that directly and unequivocally provides the special definition for the term or phrase. Additionally, the term, “or,” as used herein, refers to a non-exclusive “or,” unless otherwise indicated (e.g., “or else” or “or in the alternative”). Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

As discussed herein, conventional store queue operations can lead to substantial delays in performing operations such as STLF. Moreover, the conventional store queue operations may lead to linear aliasing. The present application therefore describes embodiments of a unified store queue that allows for operations such as STLF, while reducing linear aliasing problems.

FIG. 1 conceptually illustrates a computer system 100, according to some embodiments. The computer system 100 may be a personal computer, a laptop computer, a handheld computer, a netbook computer, a mobile device, a tablet computer, an ultrabook, a telephone, a personal data assistant (PDA), a server, a mainframe, a work terminal, a smart television, or the like. The computer system includes a main structure 105 which may be a computer motherboard, system-on-a-chip, circuit board or printed circuit board, a desktop computer enclosure or tower, a laptop computer base, a server enclosure, part of a mobile device, tablet, personal data assistant (PDA), or the like. The computer system 100 may run an operating system such as Linux®, Unix®, Windows®, Mac OS®, or the like.

In some embodiments, the main structure 105 includes a graphics card 120. For example, the graphics card 120 may be an ATI Radeon™ graphics card from Advanced Micro Devices (“AMD”). The graphics card 120 may, in different embodiments, be connected on a Peripheral Component Interconnect (PCI) Bus (not shown), PCI-Express Bus (not shown), an Accelerated Graphics Port (AGP) Bus (also not shown), or other electronic or communicative connection. The graphics card 120 may include a graphics processing unit (GPU) 125 used in processing graphics data. The graphics card 120 may be referred to as a circuit board or a printed circuit board or a daughter card or the like.

The computer system 100 may comprise a processor 110, in accordance with some embodiments. Modern computer systems may exist in a variety of forms, such as telephones, tablet computers, desktop computers, laptop computers, servers, smart televisions, or other consumer electronic devices. The processor unit 110 may comprise one or more central processing units (CPUs) 140. The CPU 140 is capable of performing memory operations using the unified store queue taught herein.

The CPU(s) 140 may be electronically or communicatively coupled to a northbridge 145. The CPU 140 and northbridge 145 may be housed on the motherboard (not shown) or some other structure of the computer system 100. In some embodiments, the graphics card 120 may be coupled to the CPU 140 via the northbridge 145 or some other electronic or communicative connection. For example, the CPU 140, northbridge 145, and GPU 125 may be included in a single package or as part of a single die or “chip”. The northbridge 145 may be coupled to a system RAM (or DRAM) 155 or the system RAM 155 may be coupled directly to the CPU 140. The system RAM 155 may be of any RAM type known in the art; the type of system RAM 155 may be a matter of design choice. The northbridge 145 may be connected to a southbridge 150. The northbridge 145 and southbridge 150 may be on the same chip in the computer system 100, or the northbridge 145 and southbridge 150 may be on different chips. The southbridge 150 may be connected to one or more data storage units 160. The data storage units 160 may be hard drives, solid state drives, magnetic tape, or any other non-transitory, writable media used for storing data. In various embodiments, the CPU 140, northbridge 145, southbridge 150, GPU 125, or system RAM 155 may be a computer chip or a silicon-based computer chip, or may be part of a computer chip or a silicon-based computer chip. The various components of the computer system 100 may be operatively, electrically, or physically connected or linked with a bus 195 or more than one bus 195. Some embodiments of the buses 195 may be result buses that are used to convey results of operations performed by one functional entity in the computer system 100 to another functional entity in the computer system 100.

The computer system 100 may be connected to one or more display units 170, input devices 180, output devices 185, or peripheral devices 190. These elements may be internal or external to the computer system 100, and may be wired or wirelessly connected. The display units 170 may be internal or external monitors, television screens, handheld device displays, touchscreens, and the like. The input devices 180 may be any one of a keyboard, mouse, track-ball, stylus, mouse pad, mouse button, joystick, touchscreen, scanner or the like. The output devices 185 may be any one of a monitor, printer, plotter, copier, or other output device. The peripheral devices 190 may be any other device that can be coupled to a computer. Example peripheral devices 190 may include a CD/DVD drive capable of reading or writing to physical digital media, a USB device, Zip Drive, external hard drive, phone or broadband modem, router/gateway, access point or the like.

The GPU 125 and the CPU 140 may implement various functional entities including one or more processor cores, floating-point units, arithmetic logic units, load store units, translation lookaside buffers, instruction pickers, or caches such as L1, L2, or L3 level caches in a cache hierarchy.

FIG. 2 conceptually illustrates an example of a semiconductor device 200 that may be formed in or on a semiconductor wafer (or die), according to some embodiments. The semiconductor device 200 may be formed in or on the semiconductor wafer using well known processes such as deposition, growth, photolithography, etching, planarizing, polishing, annealing, and the like. Some embodiments of the device 200 include a central processing unit (CPU) 205 that is configured to access instructions or data that are stored in the main memory 210. Some embodiments of the CPU 205 may be implemented as part of the CPU 140 shown in FIG. 1, the GPU 125 shown in FIG. 1, or other processing elements.

The CPU 205 includes one or more CPU cores 215 that are used to execute the instructions or manipulate data. The CPU 205 also implements a hierarchical (or multilevel) cache system that is used to speed access to the instructions or data by storing selected instructions or data in the caches. However, persons of ordinary skill in the art having benefit of the present disclosure should appreciate that some embodiments of the device 200 may implement different configurations of the CPU 205, such as configurations that use external caches. Some embodiments may implement different types of processors such as graphics processing units (GPUs) or accelerated processing units (APUs) and some embodiments may be implemented in processing devices that include multiple processing units or processor cores.

The cache system shown in FIG. 2 includes a level 2 (L2) cache 220 for storing copies of instructions or data that are stored in the main memory 210. Relative to the main memory 210, the L2 cache 220 may be implemented using faster memory elements and may have lower latency. The cache system shown in FIG. 2 also includes an L1 cache 225 for storing copies of instructions or data that are stored in the main memory 210 or the L2 cache 220. Relative to the L2 cache 220, the L1 cache 225 may be implemented using faster memory elements so that information stored in the lines of the L1 cache 225 can be retrieved quickly by the CPU 205. Some embodiments of the L1 cache 225 are separated into different level 1 (L1) caches for storing instructions and data, which are referred to as the L1-I cache 230 and the L1-D cache 235. Persons of ordinary skill in the art having benefit of the present disclosure should appreciate that the cache system shown in FIG. 2 is one example of a multi-level hierarchical cache memory system and some embodiments may use different multilevel caches including elements such as L0 caches, L1 caches, L2 caches, L3 caches, and the like.

The CPU core 215 can execute programs that are formed using instructions such as load instructions and store instructions. Some embodiments of programs are stored in the main memory 210 and the instructions are kept in program order, which indicates the logical order for execution of the instructions so that the program operates correctly. For example, the main memory 210 may store instructions for a program 240 that includes the store S1, the load L1, and another instruction D1 that may provide data to the store S1 in program order. Instructions that occur earlier in program order are referred to as “older” instructions and instructions that occur later in program order are referred to as “younger” instructions. Persons of ordinary skill in the art having benefit of the present disclosure should appreciate that the program 240 may also include other instructions that may be performed earlier or later in the program order of the program 240.

Some embodiments of the CPU 205 are out-of-order processors that can execute instructions in an order that differs from the program order of the instructions in the program 240. The instructions may therefore be decoded and dispatched in program order and then issued out-of-order. As used herein, the term “dispatch” refers to sending a decoded instruction to the appropriate unit for execution and the term “issue” refers to executing the instruction. The CPU 205 includes a picker 245 that is used to pick instructions for the program 240 to be executed by the CPU core 215. For example, the picker 245 may select instructions from the program 240 in the order L1, S1, D1, which differs from the program order of the program 240 because the younger load L1 is picked before the older store S1, which is picked before the older instruction D1.

Some embodiments of the CPU 205 implement an arithmetic logic unit (ALU) 250 that is used to perform arithmetic or logical operations. For example, the ALU 250 may receive input from one or more registers in the CPU 205 and may be controlled to perform one or more arithmetic or logical operations on the input and then write the results to one or more output registers in the CPU 205. Some embodiments of the ALU 250 may be used to perform operations indicated by instructions (such as the instruction D1) and the results may be provided to a store instruction (such as the store instruction S1) for subsequent writing to one or more of the caches 220, 225, 230, 235. Some embodiments of the CPU 205 may implement a floating-point unit (FPU) 255 to perform operations such as addition, subtraction, multiplication, division, and square root, or transcendental functions on floating point numbers. Some embodiments of the CPU 205 may include buses (such as the buses 195 shown in FIG. 1) for conveying results of operations between entities within the CPU 205.

The CPU 205 may implement a unified store queue (USQ) unit 280 capable of performing store operations. A load-store unit (LS) 260 may comprise a USQ unit 280 and one or more store queues 265 that are used to hold the store instructions and associated data. The data location for each store instruction is indicated by a virtual address, which may be translated into a physical address so that data can be accessed from the main memory 210 or one of the caches 220, 225, 230, 235. The CPU 205 may therefore include a translation look aside buffer (TLB) 270 that is used to translate virtual addresses into physical addresses. The store instruction may be placed in the store queue 265 to wait for data upon dispatch. Entries in the store queue 265 may therefore be allocated prior to the store instruction (such as S1) receiving a valid address translation from the TLB 270 or becoming eligible for execution. Entries in the store queue 265 include storage space for the data that is to be written to the physical address by the corresponding store instruction. Consequently, entries corresponding to the store instruction are able to receive data upon dispatch and prior to receiving an address translation.

The USQ unit 280 may include a tracking logic 261 and an STLF logic 262. The tracking logic 261 may comprise a dispatch pointer, which indicates where in the store queue new stores (in this context, store instructions with basic information relating thereto, e.g., ROB tag and source register, among others) should be inserted. (The address and data for a new store may, but need not, be known when queuing the new store.) The tracking logic 261 may also comprise a head pointer, which may indicate the oldest store, which is the next one to perform the write operation into memory, i.e., the next one to commit. The tracking logic 261 may also comprise a non-speculative pointer, which is capable of tracking which stores are speculative and which stores are not speculative. As used herein, a speculative state associated with a memory location may refer to the state of the memory location subsequent to the execution of one or more speculative store memory operations. A particular instruction may be speculative if one or more branch instructions prior to the particular instruction in the program order have not yet been executed, or if the instruction has not yet been retired. “Not yet retired” refers to a store instruction that has not yet reached the head of the ROB, or that has reached the head of the ROB but that attendant logic has not yet decided to retire. “Retirement” here is distinct from “commitment.” Retired instructions are considered complete to software, whereas committed instructions are retired ones which have been written into cache. The non-speculative pointer may point to somewhere in between the dispatch pointer and the head pointer (see FIG. 3A). The non-speculative pointer may indicate the point that the store instruction has been retired and/or that all branch instructions prior to it in program order have been executed. The non-speculative pointer may indicate the point that delineates pre-retired and post-retired (i.e., retired but not yet committed) store instructions. This is useful for branch mispredict rollback. When a branch mispredict occurs, the non-retired stores are flushed out, and the retired stores are kept intact.

The dispatch pointer, the head pointer, and the non-speculative pointer each may be accompanied with a wrap bit that helps indicate when the pointers wrap back to the beginning of the queue. This may reveal where the oldest entries are located. If a branch mispredict occurs, the dispatch pointer may be reset to the non-speculative pointer, while the other pointers remain the same.
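As a minimal sketch of the pointer scheme described above, the following Python model pairs each pointer with a wrap bit and shows a mispredict rollback that moves only the dispatch pointer. The class and function names (QueuePointer, occupancy) are hypothetical, entry indices are 0-based here for convenience, and the occupancy calculation is one assumed way of using the wrap bits rather than a description of the actual tracking logic 261.

```python
class QueuePointer:
    """A circular-queue pointer with a wrap bit (illustrative model)."""

    def __init__(self, size):
        self.size = size
        self.index = 0
        self.wrap = 0

    def advance(self):
        self.index += 1
        if self.index == self.size:
            # Wrapped back to the beginning of the queue: toggle the wrap bit.
            self.index = 0
            self.wrap ^= 1

    def copy_from(self, other):
        self.index, self.wrap = other.index, other.wrap


def occupancy(head, dispatch):
    """Count entries from head (oldest) up to dispatch (next free slot)."""
    if head.wrap == dispatch.wrap:
        return dispatch.index - head.index
    # Differing wrap bits: dispatch has wrapped once more than head.
    return dispatch.size - head.index + dispatch.index


# Example: a 4-entry queue filled completely, with two stores retired.
head, non_spec, dispatch = QueuePointer(4), QueuePointer(4), QueuePointer(4)
for _ in range(4):
    dispatch.advance()
for _ in range(2):
    non_spec.advance()
full_count = occupancy(head, dispatch)

# Branch mispredict: only the dispatch pointer is reset.
dispatch.copy_from(non_spec)
after_flush = occupancy(head, dispatch)
```

Without the wrap bit, a completely full queue and an empty queue would both show identical head and dispatch indices; the wrap bit is what lets this model tell them apart.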

FIG. 3A illustrates a stylized, exemplary set of USQ entries. In the example of FIG. 3A, the head pointer is pointing to entry location 1, while the non-speculative pointer is pointing to entry location 2. The load pointer, which indicates where in the USQ new load instructions should be retrieved and is dependent on the specific load instruction currently executing, is pointing to entry location 5, and the dispatch pointer is pointing to entry location 7. In the example of FIG. 3A, stores in location 1 through location 6 are valid, that is, no new stores would be stored in those locations. The store in location 1 is the oldest store, as well as the only retired store, i.e., the store(s) between the head pointer and the non-speculative pointer. The stores in location 1 through location 4 are older than the entry in the USQ pointed to by the load pointer, and the store in location 6 is younger than the entry in the USQ pointed to by the load pointer.
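The FIG. 3A example can be reproduced with a small Python helper. This is a sketch only: the function name classify and the dictionary keys are hypothetical, and the code assumes the pointers have not wrapped, so that plain comparisons of location numbers reflect age.

```python
def classify(head, non_spec, load, dispatch):
    """Classify USQ entry locations from the four pointers (no wraparound assumed)."""
    # Valid entries run from the head pointer up to, but not including, dispatch.
    valid = list(range(head, dispatch))
    return {
        "valid": valid,
        # Retired stores sit between the head and non-speculative pointers.
        "retired": [e for e in valid if e < non_spec],
        "older_than_load_ptr": [e for e in valid if e < load],
        "younger_than_load_ptr": [e for e in valid if e > load],
    }


# Pointer values taken from the FIG. 3A description.
fig_3a = classify(head=1, non_spec=2, load=5, dispatch=7)
```

With these pointer values, the helper reports locations 1 through 6 as valid, location 1 as the only retired store, locations 1 through 4 as older than the entry at the load pointer, and location 6 as younger.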

Turning back to FIG. 2, the USQ unit 280 also comprises an STLF logic 262. The STLF logic 262, in one embodiment, provides for a store to load forwarding (STLF) process that takes into account linear addressing. Upon execution of a load instruction, a target load instruction is selected. The target load instruction includes basic information relating thereto, and at least the linear address. If the selected load linear address matches an older store, data from that store is forwarded to the load, thereby allowing for completing the load earlier than otherwise. When a load is selected for an operation, the corresponding linear address is received by the USQ unit 280. The USQ unit 280 performs a search for all stores that would match a subset of their address bits (e.g., the lower order bits 15:0). The USQ unit 280 may then perform a ranking function that would prioritize the stores according to age.

In order to perform the forwarding operation, the USQ unit 280 may select the youngest older store that matches the load and prepare the corresponding data for forwarding. In one embodiment, a multiplexer is used to forward the data associated with the load. Subsequently, the USQ unit 280 may compare the full linear address (e.g., bits 47:0) of the store to that of the load. If the full linear address of the store does not fully match that of the load, the USQ unit 280 may cancel the forwarding process.
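The search, age ranking, and full-address check described above may be illustrated by the following sketch. It is a behavioral model under stated assumptions: the `Store` type, its `age` field (smaller meaning older), and the `select_forward_data` function are hypothetical names introduced here, and hardware would typically perform the comparisons in parallel rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Optional

PARTIAL_MASK = 0xFFFF          # linear address bits 15:0
FULL_MASK = (1 << 48) - 1      # linear address bits 47:0


@dataclass
class Store:
    age: int          # hypothetical age stamp: a smaller value means an older store
    linear_addr: int
    data: int


def select_forward_data(load_addr: int, load_age: int,
                        stores: List[Store]) -> Optional[int]:
    """Return data to forward to the load, or None if forwarding is canceled."""
    # Search: older stores whose bits 15:0 match those of the load.
    candidates = [s for s in stores
                  if s.age < load_age
                  and (s.linear_addr ^ load_addr) & PARTIAL_MASK == 0]
    if not candidates:
        return None
    # Ranking: pick the youngest of the older matching stores.
    chosen = max(candidates, key=lambda s: s.age)
    # Full compare on bits 47:0; a mismatch cancels the forwarding.
    if (chosen.linear_addr ^ load_addr) & FULL_MASK != 0:
        return None
    return chosen.data
```

In this sketch a full-address mismatch on the selected store cancels forwarding outright, even if an older store would have matched fully, which mirrors the cancel behavior described above rather than retrying with another candidate.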

In some cases, utilization of linear addressing may result in linear aliasing. Linear aliasing may refer to a plurality of linear addresses mapping to the same physical address. The stores in the USQ unit 280 in the LS 260 may have distinct linear addresses, but in some cases, a plurality of stores may map to the same physical location. Therefore, in this case, there is a danger of forwarding incorrect data for a particular operation.

Some embodiments provide for accounting for linear aliasing. During execution of a store operation, a new store in the USQ unit 280 may receive a physical address. The physical address is checked against all older stores in the USQ unit 280. Any older stores that match the physical address of the new store, but mismatch a predetermined subset of linear address bits (e.g., bits 15:12) of the new store, are tagged with a “poisoned” status. Some embodiments may employ comparators (physical, firmware, and/or virtual) for comparing against the physical addresses. Some embodiments provide for the stores to comprise “poisoned” bits that are asserted to indicate a poisoned status. Upon tagging a store as having a poisoned status, that particular store would be prevented from participating in a subsequent STLF process. In this manner, a full comparator on the linear address is not needed for performing the physical address comparisons. Further, this process may reduce the possibility of forwarding the incorrect data for a particular operation.
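The poisoning check may be sketched as below. The `StoreEntry` type and the `poison_older_stores` function are hypothetical names used only for illustration, and an entry without a physical address yet is assumed to be skipped. Only the physical address and linear bits 15:12 are compared, so no full linear comparator is modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


def linear_subset(linear_addr: int) -> int:
    """Extract linear address bits 15:12."""
    return (linear_addr >> 12) & 0xF


@dataclass
class StoreEntry:
    linear_addr: int
    phys_addr: Optional[int] = None   # None until a translation is received (assumption)
    poisoned: bool = False


def poison_older_stores(new_store: StoreEntry,
                        older_stores: List[StoreEntry]) -> int:
    """Tag older stores that alias the new store; return how many were tagged."""
    tagged = 0
    for store in older_stores:
        if store.phys_addr is None:
            continue  # no physical address to compare yet
        same_physical = store.phys_addr == new_store.phys_addr
        subset_differs = (linear_subset(store.linear_addr)
                          != linear_subset(new_store.linear_addr))
        if same_physical and subset_differs:
            store.poisoned = True   # excluded from later STLF
            tagged += 1
    return tagged
```

An older store with the same physical address but matching bits 15:12 is left untagged here; in that case the full linear address comparison of the STLF process remains available to reject a load whose upper address bits differ.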

Embodiments herein provide for employing an STLF scheme in conjunction with the linear aliasing error reduction processes described herein. Therefore, utilizing embodiments herein, instead of the processor 110 having to shift queues in the LS 260, pointers are used for more efficient memory transactions. The linear alias effect reduction scheme (or linear alias poisoning scheme) utilized in conjunction with the USQ and STLF process described herein provides for a more efficient design, for example, because this approach does not require a full linear and physical CAM. Some embodiments herein avoid the use of pipeline flushes to correct erroneous STLF processes, and instead block (or stall execution of) the load to reduce the possibility of erroneous information being used for memory operations. Utilizing the LS 260 embodiment described herein, power consumption reduction may be achieved. Further, organizing the stores using the USQ unit 280, a more centralized approach for handling stores using the in-order queues described herein, may provide for a more simplified and efficient design.

FIG. 3B conceptually illustrates one example of a store queue 300 such as the store queue 265 shown in FIG. 2, according to some embodiments. The store queue 300 is configured to store entries 305 associated with store instructions. The entries 305 include an address field (ADDR) that includes information indicating an address of a location for storing data associated with the store instruction, such as a virtual address or, if the store instruction has received an address translation, a physical address in a memory page. The entries 305 also include space for holding data (DATA) that is to be written to the address indicated in the address field upon execution of the corresponding store instruction. Although the DATA space is allocated to each entry 305 when the entry is allocated to a store instruction, data may not be initially stored in the DATA space. For example, the store instruction may be awaiting data from an in-flight operation, as discussed herein.

Some embodiments of the entry 305 in the store queue 300 are configured to store information indicating the source (SOURCE) of the data that is going to be written to the address indicated in the address field. For example, the source of the data may be a fixed or predetermined value (like 0) or the data may be provided by a register file or an in-flight operation. Since the entry 305 includes space for holding the data, the data may be written into the entry as soon as it is available. For example, fixed values may be entered into the DATA field immediately upon allocation of the entry 305. Some embodiments of the store queue 300 may have one or more connections to one or more register files so that entries 305 can access the data in the register files and write this information into the DATA field as soon as the data is available in the register file. Data generated by an in-flight operation may be written to the DATA field when execution of the in-flight operation completes. For example, the store queue 300 may snoop result buses and obtain data when it sees an operation complete and assert the result on the result bus. Some embodiments of the store queue 300 may use the same storage elements for the SOURCE and DATA fields. For example, storage elements associated with the DATA field may store information indicating where the data is coming from (e.g., SOURCE information) and this information may be replaced with the actual data when the data arrives. The entry 305 may also include one or more status bits, as indicated in FIG. 3B. The status bits may provide information, such as whether a store is “poisoned.”

Some embodiments of the entries 305 may be configured to store information that indicates the relative age of the entries 305. For example, the relative age of the entry 305 may be indicated by a pointer that points to the next youngest or oldest entry 305, by timestamps or counters that indicate the relative ages of the entries 305, or by storing the entries 305 in an order that indicates their relative ages.

Referring back to FIG. 2, one or more load queues 275 are implemented in the load-store unit 260 shown in FIG. 2. Load data may be indicated by virtual addresses, and so the virtual addresses for load data may be translated into physical addresses by the TLB 270. A load instruction (such as L1) may be added to the load queue 275 on dispatch or when the load instruction is picked and receives a valid address translation from the TLB 270. Either the virtual or physical address of the load instruction may be used to check the store queue 265 for address matches. If an address (virtual or physical depending on the embodiment) in the store queue 265 matches the address of the data used by the load instruction, then store-to-load forwarding may be used to forward the data from the store queue 265 to the load instruction in the load queue 275.

Entries in the store queue 265 may be eligible to initiate STLF as soon as they have been allocated to a store instruction and received an address, even though the corresponding store instruction may not have received the data that is to be forwarded. For example, the load-store unit 260 may use indications that a source of the data is in the process of generating the data for the entry in the store queue 265 and timing information associated with the source, the store queue 265, or the load queue 275 to provide a wake-up signal from the store queue 265 to the load queue 275. For example, the results of an operation (such as the instruction D1) performed by the ALU 250 or the FPU 255 may be provided to an entry in the store queue 265 (e.g., an entry corresponding to the store instruction S1) and subsequently forwarded to an entry in the load queue 275 such as an entry corresponding to the load instruction L1. A wake-up signal may therefore be provided from the store queue 265 to the load queue 275 in response to the operation being scheduled for execution. The load queue 275 may use the wake-up signal to schedule execution of the load instruction. Scheduling of the load instruction may be timed so that the data is available for forwarding from the store queue 265 when needed by the load instruction. Some embodiments of the system may include separate buses for carrying the result data and scheduling information between the ALU 250, the FPU 255, the store queue 265, and the load queue 275.

FIG. 4 conceptually illustrates an example of a computer system 400 that includes result buses 405 and scheduling buses 410, according to some embodiments. The result buses 405 and the scheduling buses 410 shown in FIG. 4 may be used to convey result data or scheduling information, respectively, between elements in the computer system 400 such as an ALU 415, an FPU 420, a store queue 425, or a load queue 430. Embodiments of the ALU 415, the FPU 420, the store queue 425, or the load queue 430 may be implemented in some embodiments of the device 200 depicted in FIG. 2. In some embodiments, the store queue 425 may be part of the USQ unit 280, and in other embodiments, the store queue 425 may be separate from the USQ unit 280. The store queue 425 can monitor the scheduling buses 410 to determine when operations associated with store instructions in the store queue 425 have been scheduled for execution. The store queue 425 can then initiate the wake-up process for STLF in response to detecting a signal indicating that an associated operation has been scheduled. For example, the ALU 415 or the FPU 420 may provide a signal to the scheduling buses 410(1-2) in response to scheduling execution of operations. For another example, the load queue 430 may provide a signal to the scheduling bus 410(3) when a load instruction is scheduled to be executed.

The store queue 425 may detect signals on one of the scheduling buses 410 and may provide a wake-up signal to the load queue 430 if the signal on one or more of the scheduling buses 410 indicates scheduling of an operation or load instruction that provides results that are used by a store instruction that is eligible for STLF. The load queue 430 may receive the wake-up signal and respond to the wake-up signal by scheduling execution of the load instruction, as discussed herein. Once the operation or instruction has completed execution, the ALU 415, FPU 420, or load queue 430 can provide results of the operation to the result buses 405. The store queue 425 may read the results from the result buses 405 and store them in the corresponding entry. Data from the entry in the store queue 425 may then be forwarded to the load queue 430 using STLF.
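The interaction between the scheduling buses, the result buses, and the wake-up signal may be modeled roughly as follows. The `SQEntry` and `StoreQueue` classes, the string tags such as "D1" and "L1", and the method names are all assumptions made for this sketch; timing and bus arbitration are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SQEntry:
    source_tag: str                     # hypothetical id of the producing operation
    data: Optional[int] = None          # filled in when the result arrives
    waiting_load: Optional[str] = None  # hypothetical id of a load awaiting STLF


class StoreQueue:
    def __init__(self, entries: List[SQEntry]):
        self.entries = entries

    def on_schedule(self, tag: str) -> List[str]:
        """Scheduling bus event: return the loads that should receive a wake-up."""
        return [e.waiting_load for e in self.entries
                if e.source_tag == tag
                and e.data is None
                and e.waiting_load is not None]

    def on_result(self, tag: str, value: int) -> None:
        """Result bus event: capture the value into entries waiting on this tag."""
        for e in self.entries:
            if e.source_tag == tag and e.data is None:
                e.data = value


# Store S1 waits on operation D1; load L1 hopes to forward from S1.
sq = StoreQueue([SQEntry("D1", waiting_load="L1"), SQEntry("D2")])
woken = sq.on_schedule("D1")   # wake-up is sent when D1 is scheduled
sq.on_result("D1", 42)         # the data arrives later on the result bus
```

The point of the sketch is the ordering: the wake-up is derived from the scheduling event, before the data exists in the entry, so that the load can be scheduled to arrive when the data is ready.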

Turning now to FIG. 5 and FIG. 6A, in some embodiments, the processor 110 comprising a CPU 140 may reside on a silicon die/chip 640. The silicon die/chip 640 may be housed on a motherboard or other structure of the computer system 100. In some embodiments, there may be more than one processor 110 on each silicon die/chip 640. Some embodiments of the processor 110 may be used in a wide variety of electronic devices.

Turning now to FIG. 6B, in accordance with some embodiments, and as described above, the processor 110 may be included on the silicon chip/die 640. The silicon chip/die 640 may contain one or more different configurations of the processor 110. The silicon chip/die 640 may be produced on a silicon wafer 630 in a fabrication facility (or “fab”) 690. That is, the silicon wafer 630 and the silicon die/chip 640 may be referred to as the output, or product of, the fab 690. The silicon chip/die may be used in electronic devices.

The circuits described herein may be formed on a semiconductor material by any known means in the art. Forming may be done, for example, by growing or deposition, or by any other means known in the art. Different kinds of hardware description languages (HDL) may be used in the process of designing and manufacturing the microcircuit devices. Examples include VHDL and Verilog/Verilog-XL. In some embodiments, the HDL code (e.g., register transfer level (RTL) code/data) may be used to generate GDS data, GDSII data and the like. GDSII data, for example, is a descriptive file format and may be used in some embodiments to represent a three-dimensional model of a semiconductor product or device. Such models may be used by semiconductor manufacturing facilities to create semiconductor products and/or devices. The GDSII data may be stored as a database or other program storage structure. This data may also be stored on a computer readable storage device (e.g., data storage units, RAMs, compact discs, DVDs, solid state storage and the like) and, in some embodiments, may be used to configure a manufacturing facility (e.g., through the use of mask works) to create devices capable of embodying various aspects of some embodiments. As understood by one of ordinary skill in the art, this data may be programmed into a computer, processor, or controller, which may then control, in whole or part, the operation of a semiconductor manufacturing facility (or fab) to create semiconductor products and devices. In other words, some embodiments relate to a non-transitory computer-readable medium storing instructions executable by at least one processor to fabricate an integrated circuit. These tools may be used to construct the embodiments described herein.

Turning now to FIG. 7, a flowchart of a method 700 relating to an STLF procedure, in accordance with some embodiments, is illustrated. The CPU 140 may select (at 710) a load instruction to be executed. Various information, such as the load linear address, the USQ pointer information, etc., is sent (at 720) to the store queue 265. Each store instruction in the store queue then compares at least a subset of its linear address bits (e.g., 15:0) to the corresponding bits of the linear address of the load instruction (at 730). A determination is made whether there exists a store instruction having a linear address that substantially matches the linear address of the load instruction (at 750). If a determination is made that no store instruction substantially matches the subset of the linear address, the STLF procedure may then be canceled (at 760) for the current operation.

If a determination is made that a store instruction fully matches the subset of the linear address (at 750), the full linear address of that store instruction is compared to that of the load instruction (at 765). If a determination is made that the full linear address of the load instruction does not match (at 770) that of the current store instruction, the STLF procedure is canceled (at 785). However, if a determination is made that the full linear address substantially matches (at 770) that of the current store instruction, the status bit(s) (e.g., the non-forwardable or poison bit) may be checked (at 775). A determination is made whether the status bit(s) indicate no linear alias or partial linear address mismatch with the store instruction (e.g., the non-forwardable bit and/or the poisoned bit is not set) (at 780). In some embodiments, other status bits may also be checked in order to make a final STLF determination. If a linear alias and/or partial linear address mismatch with the store instruction is indicated by at least one of the status bits, the STLF process is canceled for the current operation (at 785). If the status bits show no problems, then the data from the store is forwarded to the load (at 790).
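The decision sequence of method 700 for a single candidate store may be summarized in the following sketch, which returns the reference numeral of the terminating block. The function name `stlf_outcome` and the single `status_bits_set` flag (standing in for the non-forwardable and/or poisoned bits) are assumptions for illustration.

```python
PARTIAL_MASK = 0xFFFF          # linear address bits 15:0
FULL_MASK = (1 << 48) - 1      # linear address bits 47:0


def stlf_outcome(load_addr: int, store_addr: int, status_bits_set: bool) -> int:
    """Return the block of method 700 at which the flow ends for this store."""
    diff = load_addr ^ store_addr
    if diff & PARTIAL_MASK:
        return 760   # no partial match: STLF canceled
    if diff & FULL_MASK:
        return 785   # full linear address mismatch: STLF canceled
    if status_bits_set:
        return 785   # alias or partial mismatch flagged: STLF canceled
    return 790       # data forwarded from the store to the load
```

For instance, two addresses that agree in bits 15:0 but differ in higher bits end at block 785, while identical addresses with clear status bits end at block 790.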

Turning now to FIG. 8, a flowchart of a method 800 relating to a store operation for reducing or solving a linear aliasing effect, in accordance with some embodiments, is illustrated. Implementation of the linear addressing process described herein may result in linear aliasing. Linear aliasing generally refers to multiple linear addresses that map to the same physical address. As a result, errors can occur when forwarding data to a load because incorrect data may be forwarded for an operation. Some embodiments herein provide for reducing linear aliasing. During execution of a store operation, a new store in the USQ unit 280 may receive a physical address (at 810). The physical address may be received from a translation look-aside buffer (TLB), a table in the memory of the processor 110 that contains information about pages in memory. The physical address of the new store, as well as a subset of its linear address (e.g., bits 15:12), are broadcast (at 820) to the store queue 265 of the LS 260 for comparison.

As a result of the broadcast of block 820, for each store in the store queue 265, the method 800 may compare (at 830) its physical address and a subset of its linear address to the corresponding physical address and linear address subset of the new store. A determination is made (at 840) whether there is a physical address match and whether the partial linear addresses (e.g., bits 15:12) match. If there is a physical address match and no partial linear address (e.g., bits 15:12) mismatch, an indication is provided (at 850) that there is no linear aliasing effect as a result of the current operation. Further, a physical address mismatch is also an indication that there is no linear alias effect as a result of the current operation.

Upon a determination that there is a physical address match and a linear address subset (e.g., bits 15:12) mismatch, a signal indicating that a linear alias has been detected is asserted (at 860). Some embodiments provide for a “poisoned” status bit to be set, indicating that a linear alias has been detected. Some embodiments provide for asserting (at 870) a “not-forwardable” status bit of the store in the store queue to indicate that its physical address matches, and its partial linear address mismatches, that of the new store. Moreover, the “not-forwardable” status bit of the new store is also set (at 880), thereby reducing the possibility of an error caused by linear aliasing.

Embodiments herein provide for employing an STLF scheme in conjunction with the linear aliasing error reduction processes described herein. Utilizing the LS 260 embodiment described herein, power consumption reduction may be achieved. Further, organizing the stores using the USQ unit 280, a more centralized approach for handling stores using the in-order queues described herein, may provide for a more simplified and efficient process.

Embodiments of processor systems that can allocate store queue entries to store instructions for the USQ and STLF processes described herein (such as the processor system 100) may be fabricated in semiconductor fabrication facilities according to various processor designs. In one embodiment, a processor design can be represented as code stored on computer readable media. Exemplary codes that may be used to define and/or represent the processor design may include HDL, Verilog, and the like. The code may be written by engineers, synthesized by other processing devices, and used to generate an intermediate representation of the processor design, e.g., netlists, GDSII data and the like. The intermediate representation can be stored on computer readable media and used to configure and control a manufacturing/fabrication process that is performed in a semiconductor fabrication facility. The semiconductor fabrication facility may include processing tools for performing deposition, photolithography, etching, polishing/planarizing, metrology, and other processes that are used to form transistors and other circuitry on semiconductor substrates. The processing tools can be configured and operated using the intermediate representation, e.g., through the use of mask works generated from GDSII data.

Portions of the disclosed subject matter and corresponding detailed description are presented in terms of software, or algorithms and symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the ones by which those of ordinary skill in the art effectively convey the substance of their work to others of ordinary skill in the art. An algorithm, as the term is used here, and as it is used generally, is conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of optical, electrical, or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, or as is apparent from the discussion, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Note also that the software implemented aspects of the disclosed subject matter are typically encoded on some form of program storage medium or implemented over some type of transmission medium. The program storage medium may be magnetic (e.g., a floppy disk or a hard drive) or optical (e.g., a compact disk read only memory, or “CD ROM”), and may be read only or random access. Similarly, the transmission medium may be twisted wire pairs, coaxial cable, optical fiber, or some other suitable transmission medium known to the art. The disclosed subject matter is not limited by these aspects of any given implementation.

Furthermore, the methods disclosed herein may be governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by at least one processor of a computer system. Each of the operations of the methods may correspond to instructions stored in a non-transitory computer memory or computer readable storage medium. In various embodiments, the non-transitory computer readable storage medium includes a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The computer readable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted and/or executable by one or more processors.

The particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.

What is claimed:
 1. A method, comprising: selecting a load instruction for execution; determining whether a full linear address of the load instruction matches a full linear address of one store instruction of a plurality of store instructions in a store queue; responsive to determining an address match between the full linear address of the load instruction and the full linear address of one store instruction of the plurality of store instructions, checking whether one or more status bits of the one store instruction are set, wherein the one or more status bits are set upon a determination of a physical address match and a partial linear address mismatch between the one store instruction and a second store instruction of the plurality of store instructions; and forwarding, in response to determining that the one or more status bits of the one store instruction are not set, data associated with the one store instruction of the plurality of store instructions to complete execution of the load instruction.
 2. The method of claim 1, further comprising: preventing the forwarding of data in response to determining that no address match exists between the full linear address of the load instruction and the plurality of store instructions.
 3. The method of claim 2, further comprising: blocking data associated with the one store instruction from being forwarded to the load instruction, in response to determining that the one or more status bits of the one store instruction are set.
 4. The method of claim 3, wherein the one or more status bits indicate at least one of a linear alias or a partial linear address mismatch with a store instruction.
 5. The method of claim 2, wherein determining whether the full linear address of the load instruction matches the full linear address of the one store instruction is in response to a determination that a subset of the full linear address of the one store instruction matches a subset of the full linear address of the load instruction.
 6. The method of claim 5, further comprising: comparing the subset of the full linear address of the load instruction to a subset of a full linear address of each store instruction of the plurality of store instructions to determine whether a match exists; and canceling forwarding of data associated with the one store instruction to a corresponding load instruction responsive to a determination that no match exists.
 7. A method, comprising: broadcasting a physical address and a subset of a linear address of a new store instruction to a store queue comprising a plurality of store instructions; comparing a physical address and a subset of a linear address of each store instruction of the plurality of store instructions in the store queue to the physical address and the subset of the linear address of the new store instruction; asserting a signal in the store queue indicating detection of a linear alias, in response to the physical address of the new store instruction matching the physical address of one store instruction of the plurality of store instructions in the store queue, and the subset of the linear address of the new store instruction not matching the subset of the linear address of the one store instruction, wherein asserting the signal comprises asserting a status bit indicating a poisoned status; and preventing forwarding of data from the one store instruction in response to the asserted status bit.
 8. The method of claim 7, wherein broadcasting the subset of the linear address of the new store instruction comprises broadcasting address bits 15:12.
 9. The method of claim 7, further comprising indicating that the linear alias is not present in response to a determination that the physical address of the new store instruction fails to match the physical address of the one store instruction.
 10. A method, comprising: providing a first linear address of a load instruction for execution to a unified store queue (USQ) comprising a plurality of store instructions; comparing a first subset of the first linear address to corresponding subsets of linear addresses of the plurality of store instructions in the USQ, to determine a matching store instruction; comparing a full address of the first linear address to a full linear address of the matching store instruction to determine whether a full matching store instruction is present, in response to a determination of the matching store instruction; comparing a physical address of the matching store instruction to a physical address of each of the plurality of store instructions of the USQ to determine whether there is a physical address match; comparing a second subset of the linear address of the matching store instruction to a corresponding subset of the linear address of each of a plurality of store instructions of the USQ to determine whether there is a partial linear address mismatch; determining that a linear alias is present in response to a determination that there is a physical address match and a partial linear address mismatch; and blocking data from the matching store instruction from being used for executing the load instruction in response to a determination that the linear alias is present.
 11. The method of claim 10, further comprising allowing, based on a determination the linear alias is not present, data from the matching store instruction to be used for executing the load instruction.
 12. The method of claim 10, wherein: comparing the first subset of the first linear address comprises comparing bits 15:0 of the first linear address; and comparing the second subset of the linear address of the matching store instruction comprises comparing bits 15:12 of the linear address of the matching store instruction.
 13. The method of claim 10, further comprising asserting a status bit in the USQ indicating a not-forwardable status for blocking usage of data from the matching store instruction from being used for execution of the load instruction, in response to a determination that the linear alias is present.
 14. The method of claim 10, further comprising forwarding data for execution of the load instruction, in response to a determination that a plurality of store instructions do not map to a single physical memory location.
 15. The method of claim 10, further comprising: checking a status bit of the matching store instruction; determining whether the status bit indicates a non-forwardable state; blocking data relating to the matching store instruction from being forwarded for execution, in response to a determination that the status bit indicates the non-forwardable state.
 16. An integrated circuit device, comprising: a hardware processor for executing an instruction; a queue unit in communication with the processor that is configured to: receive a linear address for a load to execute the instruction; compare a subset of the linear address to a subset of a linear address of each store instruction of a plurality of store instructions located in the queue unit to determine whether there is a match; responsive to determining a match, determine whether the linear address for the load matches the linear address of a store instruction of the plurality of store instructions in the queue unit; responsive to determining an address match, check whether one or more status bits of the store instruction are set, wherein the one or more status bits are set upon a determination of a physical address match and a partial linear address mismatch between the store instruction and a different store instruction of the plurality of store instructions; and forward data associated with the store instruction of the plurality of store instructions for executing the instruction in response to determining that the one or more status bits of the store instruction are not set.
 17. The integrated circuit device of claim 16, further comprising: a linear aliasing logic configured to determine whether a plurality of linear addresses in the queue unit map to a single physical address, and to block forwarding of data associated with store instructions containing linear addresses that map into the single physical address.
 18. The integrated circuit device of claim 16, wherein the queue unit further comprises: a dispatch pointer configured to insert a new store instruction into the queue unit; a load pointer configured to indicate a location in the queue unit to retrieve new load instructions; a head pointer configured to indicate an oldest store instruction in the queue unit; and a non-speculative pointer configured to indicate which of the plurality of store instructions are speculative.
 19. A non-transitory computer-readable medium storing instructions, which when executed by at least one processor, perform a method comprising: selecting a load instruction for execution, wherein the load instruction is associated with a full linear address; determining whether a subset of the full linear address associated with the load instruction matches a subset of a linear address of a store instruction of a plurality of store instructions in a unified store queue (USQ), wherein the plurality of store instructions is queued in an order determined by a pointer associated with the USQ, and wherein the subset of the full linear address is a number of bits that is fewer than a number of bits of the full linear address; responsive to determining that an address match exists between the subset of the full linear address of the load instruction and the subset of the linear address of the store instruction, and a match exists between the full linear address of a load instruction and a full linear address of the store instruction, checking whether one or more status bits of the store instruction is set, wherein the one or more status bits are set upon a determination of a physical address match and a partial linear address mismatch between the store instruction and a different store instruction of the plurality of store instructions; and canceling forwarding of data associated with the store instruction of the plurality of store instructions to complete execution of the selected load instruction in response to determining that the one or more status bits of the store instruction is set.
 20. The non-transitory computer-readable medium of claim 19, the method further comprising: broadcasting a physical address and a subset of a linear address of a new store instruction to the USQ; comparing a physical address and a subset of a linear address of each store instruction of the plurality of store instructions in the USQ to the physical address and the subset of the linear address of the new store instruction; and asserting a signal indicating detection of a linear alias, in response to the physical address of the new store instruction matching the physical address of the store instruction of the plurality of store instructions in the USQ, and the subset of the linear address of the new store instruction not matching the subset of the linear address of the store instruction.